Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AWS] Add runtime attributes #6204

Open
wants to merge 30 commits into
base: develop
Choose a base branch
from

Conversation

henriqueribeiro
Copy link

@henriqueribeiro henriqueribeiro commented Mar 1, 2021

Changes:

Copy link
Contributor

@markjschreiber markjschreiber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good although it seems the CI/CD will block you without adding some additional test specs (this is new!?)

@geertvandeweyer
Copy link

geertvandeweyer commented Mar 15, 2021

I compiled and tested this, and it works correctly. As I'm not familiar with java/scala, I cant provide a full review unfortunately. I did notice some warnings when starting cromwell, but as everything works, maybe that's not a problem ?

2021-03-13 12:17:25,630 WARN - Unrecognized configuration key(s) for AwsBatch: auth, numCreateDefinitionAttempts, default-runtime-attributes.awsBatchRetryAttempts, awsBatchRetryAttempts, filesystems.s3.duplication-strategy, numSubmitAttempts, default-runtime-attributes.scriptBucketName

Thanks by the way ! This was exactly what we were waiting for

@henriqueribeiro
Copy link
Author

I just added some documentation.

@henriqueribeiro henriqueribeiro changed the title [AWS] Enable Automated Job Retries in AWS Batch [AWS] Add runtime attributes Apr 14, 2021
@Irsan88
Copy link

Irsan88 commented May 2, 2021

Seems like this is exactly what I need. This branch is not merged yet. Can i clone the repo, checkout this branch and build?

and can you confirm that the right way to set ulimits in the runtime block would e.g. be

runtime { ulimits = [ { "name": string, "softLimit": integer, "hardLimit": integer } ] }

@henriqueribeiro
Copy link
Author

@Irsan88 exactly. Checkout this branch and run the wdl with something like this:

"ulimits": [
      {
        "name": "nofile",
        "softLimit": "9000",
        "hardLimit": "9000"
      }
    ],

Make sure that the softLimit and hardLimit are string and not integers

@wdesouza
Copy link

wdesouza commented Oct 21, 2021

I am trying to build this branch but got the error below. Both local (macOS) and cromwell-dev Docker image.

sbt assembly
[error] /code/wom/src/main/scala/wom/views/GraphPrint.scala:113:57: type mismatch;
[error]  found   : java.util.stream.Stream[String]
[error]  required: scala.collection.GenTraversableOnce[?]
[error]           |  ${digraph.links.toList.flatMap(_.dotString.lines).mkString(System.lineSeparator + "  ")}
[error]                                                         ^
[error] /code/wom/src/main/scala/wom/views/GraphPrint.scala:116:57: type mismatch;
[error]  found   : java.util.stream.Stream[String]
[error]  required: scala.collection.GenTraversableOnce[?]
[error]           |  ${digraph.nodes.toList.flatMap(_.dotString.lines).mkString(System.lineSeparator + "  ")}
[error]                                                         ^
[error] /code/wom/src/main/scala/wom/views/GraphPrint.scala:159:49: type mismatch;
[error]  found   : java.util.stream.Stream[String]
[error]  required: scala.collection.GenTraversableOnce[?]
[error]           |  ${nodes.toList.flatMap(_.dotString.lines).mkString(System.lineSeparator() + "  ")}
[error]                                                 ^
[error] /code/wom/src/main/scala/wom/views/GraphPrint.scala:168:48: type mismatch;
[error]  found   : java.util.stream.Stream[String]
[error]  required: scala.collection.GenTraversableOnce[?]
[error]          |  ${nodes.toList.flatMap(_.dotString.lines).mkString(System.lineSeparator() + "  ")}
[error]                                                ^
[error] four errors found

Any suggestion?

@mcovarr
Copy link
Contributor

mcovarr commented Oct 22, 2021

@wdesouza I am seeing this as well. This fork was created just before #6194 that upgraded Cromwell's Java version from 8 to 11. I think these compilation errors may represent some (hopefully minor) incompatibilities.

@henriqueribeiro
Copy link
Author

@mcovarr @wdesouza I will try to update this pr asap

@wdesouza
Copy link

I've changed _.dotString.lines to _.dotString.linesIterator and it worked. Thanks @henriqueribeiro and @mcovarr

@geertvandeweyer
Copy link

Hi @henriqueribeiro ,

Do you think I can compile this branch with functionality of the latest cromwell release (77) ? I'm now using the branch based on cromwell 58, but more recent versions have retry strategy that's interesting as well:

  • you retry logic handles spot kills
  • cromwell retry handles the fetch_and_run is a directory problem.

=> both would be great, but since it doesn't get approved, I hope to make a new custom build. However, it says here there are conflicts...

Greetings,
geert

@henriqueribeiro
Copy link
Author

Hey @geertvandeweyer. I'm preparing a new release with more functionalities based on cromwell 78. It should be ready in the next few days.

Are you in the cromwell slack?

@ChristianKniep
Copy link

@henriqueribeiro what's the status of this. There's a new PR that seems to do something similar.. :)
#7675
What is left for this to be awesome (had a quick glance, hope I did not miss it) is to define retryStrategies as well, so that you can decide what to do depending on whether the spot instance was reclaimed (infra error) or the script failed (user error).

@geertvandeweyer
Copy link

Hi,

We maintain an aws specific fork that has these functionalities . Spot kills are handled separately from program errors

Check it out here:

https://github.com/henriqueribeiro/cromwell

Or my latest dev release forked from that one :

https://github.com/geertvandeweyer/cromwell

Best,
Geert

@ChristianKniep
Copy link

Any chance this can be upstreamed? An issue we can ask our customers to +1 to motivate that?

@aednichols
Copy link
Collaborator

I'm not entirely sure of how/why the AWS fork came about, but fixing all of these conflicts and upstreaming don't seem like things we'd have capacity for at present.

@markjschreiber
Copy link
Contributor

I have some back story on how the fork came about. For a while I was working with Broad on AWS Batch integration with Cromwell. At some point Broad stopped reviewing PRs for this, probably due to the focus on Terra. For this reason the fork was made. When we supported Cromwell as an option with AGC we maintained this fork however with AGC now deprecated in favor of HealthOmics we no longer maintain the fork. Others wanting to continue using Cromwell on AWS Batch built on that fork. The best option for that case is the https://github.com/geertvandeweyer/cromwell fork.

The AWS HealthOmics service WDL engine is based on miniwdl so we no longer need to make any official merge of this older fork. It is also probably seriously out of date with the HEAD of main. Gert's fork may be more easily merged?

@aednichols
Copy link
Collaborator

Thanks for the update, that is helpful to know.

It may be possible for us to just merge PRs constrained to supportedBackends/aws/ without rigamarole, since we don't use it in Terra.

@geertvandeweyer
Copy link

Hi,

Thanks for that feedback. I would absolutely be in favor of merging the functionality back into the original repo. If we get support if all changes are restricted to the AWS specific files, we can try to achieve that. Right now there are some changes in the general code base as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants